-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Change the default log config to reduce disk I/O and storage #8040
Conversation
This batches up writes to the filesystem, which is more efficient for disk I/O. This means that it can take some time for logs to get written to disk. Note that ERROR logs (and above) immediately flush the buffer. This only effects new installs, as we only write the log config if started with `--generate-config` (in the same way we do for generating signing keys).
This hopefully reduces the amount of logs kept for new servers. Keeping the last 1GB of logs is likely overkill for new servers, but equally may not be enough for busy ones. Instead, we keep the last four days worth of logs, enough so that admins can investigate any problems that happened over e.g. a long weekend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's another footgun here, which is #3471.
Basically if there is an error writing to the logfile, then that error gets written back at logger.error
, which makes the MemoryLogger try to flush again, so you stackoverflow and everything is awful.
I've long argued that redirecting stderr to the logging system is a stupid idea, but it doesn't seem to want to die, so in its absence the options seem to be (a) reimplement MemoryLogger to avoid the infinite loop (b) special-case the twisted
logger so that it goes straight to file
.
we went with (b) on matrix.org.
We do this to prevent foot guns. The default config uses a MemoryFilter, but users are free to change to logging to files directly. If they do then they have to ensure to set the `filters: [context]` on the right handler, otherwise records get written with the wrong context. Instead we move the logic to happen when we generate a record, which is when we *log* rather than *handle*. (It's possible to add filters to loggers in the config, however they don't apply to descendant loggers and so they have to be manually set on *every* logger used in the code base)
1320840
to
c2fc81e
Compare
docs/sample_log_config.yaml
Outdated
buffer: | ||
class: logging.handlers.MemoryHandler | ||
target: file | ||
capacity: 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of wonder if 100 is a bit extreme here for a new server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as in, too high? maybe a bit.
probably worth adding a comment here saying what the number means, so people can tweak it if they want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, even jki.re logs at less than 30 lines a second so a brand new server would probably not actually flush very often at all, which I can imagine will cause confusion. I'm somewhat tempted to lower it to something like ten, which would still be a huge improvement.
We don't WARN too frequently, so this is fine and ensures that we log things like "STARTING" immediately.
I've also changed the default to flush on |
synapse/config/logger.py
Outdated
@@ -55,34 +55,43 @@ | |||
format: '%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw: remember that the docker image has its own default log config; please update it to keep it in sync.
docs/sample_log_config.yaml
Outdated
buffer: | ||
class: logging.handlers.MemoryHandler | ||
target: file | ||
capacity: 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as in, too high? maybe a bit.
probably worth adding a comment here saying what the number means, so people can tweak it if they want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, there's more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm modulo more doc nitpicking
docs/sample_log_config.yaml
Outdated
# By default buffer logs in memory before logging to file. Replace "buffer" | ||
# with "console" to log to stderr instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# By default buffer logs in memory before logging to file. Replace "buffer" | |
# with "console" to log to stderr instead. | |
# Write logs to the `buffer` handler, which will buffer them together in memory, | |
# then write them to a file. | |
# | |
# Replace "buffer" with "console" to log to stderr instead. (Note that you'll | |
# also need to update the configuation for the `twisted` logger above, in | |
# this case.) | |
# |
oh also: you still haven't updated the log conf for the docker image. |
As per |
Synapse 1.19.0rc1 (2020-08-13) ============================== Removal warning --------------- As outlined in the [previous release](https://github.com/matrix-org/synapse/releases/tag/v1.18.0), we are no longer publishing Docker images with the `-py3` tag suffix. On top of that, we have also removed the `latest-py3` tag. Please see [the announcement in the upgrade notes for 1.18.0](https://github.com/matrix-org/synapse/blob/develop/UPGRADE.rst#upgrading-to-v1180). Features -------- - Add option to allow server admins to join rooms which fail complexity checks. Contributed by @lugino-emeritus. ([\#7902](#7902)) - Add an option to purge room or not with delete room admin endpoint (`POST /_synapse/admin/v1/rooms/<room_id>/delete`). Contributed by @dklimpel. ([\#7964](#7964)) - Add rate limiting to users joining rooms. ([\#8008](#8008)) - Add a `/health` endpoint to every configured HTTP listener that can be used as a health check endpoint by load balancers. ([\#8048](#8048)) - Allow login to be blocked based on the values of SAML attributes. ([\#8052](#8052)) - Allow guest access to the `GET /_matrix/client/r0/rooms/{room_id}/members` endpoint, according to MSC2689. Contributed by Awesome Technologies Innovationslabor GmbH. ([\#7314](#7314)) Bugfixes -------- - Fix a bug introduced in Synapse v1.7.2 which caused inaccurate membership counts in the room directory. ([\#7977](#7977)) - Fix a long standing bug: 'Duplicate key value violates unique constraint "event_relations_id"' when message retention is configured. ([\#7978](#7978)) - Fix "no create event in auth events" when trying to reject invitation after inviter leaves. Bug introduced in Synapse v1.10.0. ([\#7980](#7980)) - Fix various comments and minor discrepencies in server notices code. ([\#7996](#7996)) - Fix a long standing bug where HTTP HEAD requests resulted in a 400 error. ([\#7999](#7999)) - Fix a long-standing bug which caused two copies of some log lines to be written when synctl was used along with a MemoryHandler logger. ([\#8011](#8011), [\#8012](#8012)) Updates to the Docker image --------------------------- - We no longer publish Docker images with the `-py3` tag suffix, as [announced in the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/UPGRADE.rst#upgrading-to-v1180). ([\#8056](#8056)) Improved Documentation ---------------------- - Document how to set up a client .well-known file and fix several pieces of outdated documentation. ([\#7899](#7899)) - Improve workers docs. ([\#7990](#7990), [\#8000](#8000)) - Fix typo in `docs/workers.md`. ([\#7992](#7992)) - Add documentation for how to undo a room shutdown. ([\#7998](#7998), [\#8010](#8010)) Internal Changes ---------------- - Reduce the amount of whitespace in JSON stored and sent in responses. Contributed by David Vo. ([\#7372](#7372)) - Switch to the JSON implementation from the standard library and bump the minimum version of the canonicaljson library to 1.2.0. ([\#7936](#7936), [\#7979](#7979)) - Convert various parts of the codebase to async/await. ([\#7947](#7947), [\#7948](#7948), [\#7949](#7949), [\#7951](#7951), [\#7963](#7963), [\#7973](#7973), [\#7975](#7975), [\#7976](#7976), [\#7981](#7981), [\#7987](#7987), [\#7989](#7989), [\#8003](#8003), [\#8014](#8014), [\#8016](#8016), [\#8027](#8027), [\#8031](#8031), [\#8032](#8032), [\#8035](#8035), [\#8042](#8042), [\#8044](#8044), [\#8045](#8045), [\#8061](#8061), [\#8062](#8062), [\#8063](#8063), [\#8066](#8066), [\#8069](#8069), [\#8070](#8070)) - Move some database-related log lines from the default logger to the database/transaction loggers. ([\#7952](#7952)) - Add a script to detect source code files using non-unix line terminators. ([\#7965](#7965), [\#7970](#7970)) - Log the SAML session ID during creation. ([\#7971](#7971)) - Implement new experimental push rules for some users. ([\#7997](#7997)) - Remove redundant and unreliable signature check for v1 Identity Service lookup responses. ([\#8001](#8001)) - Improve the performance of the register endpoint. ([\#8009](#8009)) - Reduce less useful output in the newsfragment CI step. Add a link to the changelog section of the contributing guide on error. ([\#8024](#8024)) - Rename storage layer objects to be more sensible. ([\#8033](#8033)) - Change the default log config to reduce disk I/O and storage for new servers. ([\#8040](#8040)) - Add an assertion on `prev_events` in `create_new_client_event`. ([\#8041](#8041)) - Add a comment to `ServerContextFactory` about the use of `SSLv23_METHOD`. ([\#8043](#8043)) - Log `OPTIONS` requests at `DEBUG` rather than `INFO` level to reduce amount logged at `INFO`. ([\#8049](#8049)) - Reduce amount of outbound request logging at `INFO` level. ([\#8050](#8050)) - It is no longer necessary to explicitly define `filters` in the logging configuration. (Continuing to do so is redundant but harmless.) ([\#8051](#8051)) - Add and improve type hints. ([\#8058](#8058), [\#8064](#8064), [\#8060](#8060), [\#8067](#8067))
* commit 'db131b6b2': Change the default log config to reduce disk I/O and storage (#8040) Implement login blocking based on SAML attributes (#8052) Add an assertion on prev_events in create_new_client_event (#8041) Typo Lint why mypy why Lint Incorporate review Incorporate review Fix PUT /pushrules to use the right rule IDs Back out the database hack and replace it with a temporary config setting Fix cache name Fix cache invalidation calls Lint Changelog Implement new experimental push rules with a database hack to enable them
This was originally added when we first added a `MemoryHandler` to the default log config back in #8040, to ensure that we didn't explode with an infinite loop if there was an error formatting the logs. Since then, we made additional improvements to logging which make this workaround redundant. In particular: * we no longer attempt to log un-UTF8-decodable byte sequences, which were the most likely cause of an error in the first place. * #8268 ensures that in the unlikely case that there *is* an error, it won't cause an infinite loop.
This was originally added when we first added a `MemoryHandler` to the default log config back in #8040, to ensure that we didn't explode with an infinite loop if there was an error formatting the logs. Since then, we made additional improvements to logging which make this workaround redundant. In particular: * we no longer attempt to log un-UTF8-decodable byte sequences, which were the most likely cause of an error in the first place. * #8268 ensures that in the unlikely case that there *is* an error, it won't cause an infinite loop.
The now-removed comment was introduced in #8040, and referenced a "twisted" logging handler. That handler has since been removed from the file, so the associated comment is no longer necessary.
See the comments in the config.
Fixes #8018
Note: I'm happy to bikeshed some of the values here.